ABSTRACT

We present a simple model network example which demonstrates unstable
behavior when traffic is directed according to routing optimized for
minimal delay and the load varies at a rate comparable to the routing
calculation time. The instability can be avoided by using almost any
alternate design which avoids the knees on the delay curves, e.g. a
Maximum Entropy Method design. The delay penalty in this case turns out
to be small.


1.   Introduction 

In teaching about network performance it is sometimes difficult to convince
students of some of the counter-intuitive facts of network routing. In this note
we present a simple example of the instability which can result from too serious
an attempt at optimal bifurcated routing on a network with changing offered load.
The example is given first in the form of a three node network of identical
lines, and then in the form of a network providing multiple node-disjoint
multi-hop paths of unequal capacity.

In a large network there can be a significant benefit in avoiding poor choices of
routing. When a high speed path of very few hops is available, it would seem to be
sheer folly to send any traffic along slow multi-hop back-door paths. However, the
addition of traffic to a route removes some of its capacity, and the next set of
messages might well be better sent along an alternate path. The assignment of a set
of alternate paths with an allocation of portions of the offered load among them
gives rise to the bifurcated routing problem. Under reasonable constraints it is
possible to find routings which optimize an appropriate payoff function, e.g.
average end-to-end delay. (See [7] and [6].) Such calculations can be very
time-consuming. A node in the network may not be able to wait for a grand plan from
a central routing authority to decide where to send the next message. In that case
it may be desirable to use a simple distributed shortest-path algorithm which
allows each node to estimate the optimal current path for the next burst of
traffic. Schwartz [11, chapter 6] gives a review of the common distributed dynamic
routing techniques, and notes the necessary relationship between the diameter of a
network and the number of iterations needed for convergence. Such algorithms
normally do not give bifurcated routing directly, since they try for a "shortest"
path. However, if the definition of length of a path is total delay along it, and
if current traffic is properly accounted for, then one can expect that traffic
would be diverted from overloaded paths which were once sensed as providing minimal
delay, providing a somewhat oscillatory approximation to optimal bifurcated routing.

There are problems with the distributed algorithms. Kleinrock [10] pointed out
that "... uncontrolled alternate routing in a congested net can lead to chaos.
Indeed, the telephone company tends to limit (and even prohibit completely)
alternate routing on unusually busy days (Mother's Day, for example)." As Schwartz
notes of a common shortest path algorithm, "Although convergence to the shortest
path is guaranteed, routing table entries may change during the convergence period,
giving rise to possible loops during that interval," [11, p. 277]. Even if one
suppresses the creation of loops, there can be serious problems. When the offered
load on which routing calculations are being done varies significantly on
time-scales commensurate with the convergence period of the routing algorithm, one
has created a feedback control system which can oscillate for very long periods.
The reader is referred to standard texts on Control Theory, e.g. [1].

In this note, we present a simple example of the type of instability which can
result from computing an optimal bifurcated routing for a load which changes on the
time-scale of the calculation. While the example was created to clearly demonstrate
the sub-optimal results of optimal routing in this case, it is not, in our opinion
and observation of the Internet*, unrealistic.

* The Internet is a loose confederation of networks able to provide a reasonable
degree of interoperability for users on connected hosts. See [5].

As a contrast to optimal routing, we will mention routing found by the Maximum
Entropy Method (MEM) [2-4,8,9,12]. MEM, originally due to Jaynes, comes from the
interaction of Information Theory with Statistical Mechanics. It works for
underdetermined systems, producing the smoothest answer consistent with the data.
In the case at hand, it would produce network flows equalized for whatever
parameter we wish to smooth: traffic by links, total queue size along paths, etc.
Since we need no more than that qualitative approach for our example, we will not
present further detail on Maximum Entropy here.

We will form our examples by taking static cascades of independent M/M/1 queues as
our model of multi-hop network paths, ignoring blocking, dependence in forwarding,
and many other effects. Most importantly, we will ignore the probable failure of
stochastic equilibrium by using M/M/1 queuing models even though we vary the
customer arrival rate. In a network of large diameter and heavy traffic, it is not
unreasonable to assume that the time scales of routing calculations are
sufficiently large to consider them as leading to approximate equilibrium
queue-by-queue. It would be a good idea to do a more accurate non-equilibrium
model, but that would take us beyond our simple pedagogical objective into material
better suited to a research paper.


2.   A  Simple  Three  Node  Example 


[Figure 1 diagram not reproduced; the surviving edge label is αλ.]

Figure 1.   Three Node Network Example of Bifurcated Routing.
Traffic flows from A to B either directly or via C.


We will start with a trivial three-node network. Consider a network consisting of
three nodes, A, B, C, where A has traffic λ for B and may route it either directly
or via C. We assume independent M/M/1 queues at each node, and an equal service
rate μ for each queue. We assume that A supplies a Poisson distributed stream of
packets at rate αλ along the direct route and (1 − α)λ along the indirect route.
Our routing decision is to choose the "best" value of α. Assume we seek to control
average delay. The average delay is

$$ W = \alpha \left[ \frac{1}{\mu - \alpha\lambda} \right] + (1 - \alpha) \left[ \frac{2}{\mu - (1 - \alpha)\lambda} \right] $$


This equation becomes clearer if we define ρ = λ/μ, use a dimensionless delay Wμ,
and define σ = α − 1/2. Then

$$ W\mu = \frac{\tfrac{3}{2}\left(1 - \tfrac{\rho}{2}\right) - \sigma + 3\rho\sigma^2}{\left(1 - \tfrac{\rho}{2}\right)^2 - \rho^2\sigma^2} $$

and we seek to minimize Wμ by varying σ. In this case, it is sufficient to look at
the boundary case α = 1 (σ = 1/2) and at the location of zeros of the derivative of
Wμ with respect to σ.

The zeros of the derivative are given by

$$ \sigma = (3 \pm 2\sqrt{2}) \left( \frac{1}{\rho} - \frac{1}{2} \right) $$

The lower root is the one in the proper range (−.5 ≤ σ < .5) as long as ρ > .293.
Below that we use the direct route. Above that value the routing bifurcates.
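The closed form above can be checked numerically. The following is a minimal Python sketch, assuming the three-node delay model of this section; the function names (`dimensionless_delay`, `optimal_alpha`, `brute_force_alpha`) are illustrative and not from the original note.

```python
import math

def dimensionless_delay(rho, alpha):
    """W*mu for the three-node network: one hop direct, two hops via C."""
    direct = alpha / (1.0 - rho * alpha)
    indirect = 2.0 * (1.0 - alpha) / (1.0 - rho * (1.0 - alpha))
    return direct + indirect

def optimal_alpha(rho):
    """Delay-minimizing fraction of traffic on the direct route (0 < rho < 1)."""
    # Lower root of the derivative, sigma = (3 - 2*sqrt(2)) * (1/rho - 1/2).
    sigma = (3.0 - 2.0 * math.sqrt(2.0)) * (1.0 / rho - 0.5)
    # Below rho of about .293 the root leaves the valid range: use alpha = 1.
    return min(1.0, 0.5 + sigma)

def brute_force_alpha(rho, steps=10000):
    """Grid search over alpha in [0, 1], as an independent check."""
    grid = [k / steps for k in range(steps + 1)]
    return min(grid, key=lambda a: dimensionless_delay(rho, a))

if __name__ == "__main__":
    for rho in (0.2, 0.3, 0.6, 0.9):
        print(rho, round(optimal_alpha(rho), 3), round(brute_force_alpha(rho), 3))
```

The grid search is included only as a sanity check on the algebra; it agrees with the closed form to within the grid spacing.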

Consider a situation in which the offered load switches between, say, ρ = .3 and
ρ = .9, spending about half its time at each level. This might be due to the
inherent characteristics of the applications, or perhaps due to a periodic sensing
of overload at the higher load and a backing off to the light load to relieve
congestion. The average load is then ρ = .6, and we face the choice of routing for
the instantaneous values or for the average. The "optimum" values of α for these
values of ρ are:

     ρ       α
    .3     .99
    .6     .70
    .9     .60

(given to the nearest .01). Consider the following table of values of Wμ for these
values of α and for .5 (the Maximum Entropy value, see below).

                      α
     ρ      .99     .70     .60     .50
    .3     1.43    1.55    1.64    1.76
    .6     2.46    1.94    1.99    2.14
    .9     9.10    2.71    2.55    2.72


From this table, we observe that the average delay using the optimum α values for
the ρ = .3 and ρ = .9 cases is Wμ = 1.99, which is slightly below the average 2.13
of the Wμ values for α = .7. We gained about seven percent by using highly dynamic
routing. In fact, if we had used a dynamic shortest path routing, which would have
taken α = 1.0 in all these cases, we would have paid a serious delay penalty. Worse
yet, suppose we had a processing time for the dynamic algorithm comparable to the
cycle time of the load and switched to the values of α for ρ = .3 just when the
load switched to ρ = .9 and vice-versa. Then we would have an average delay for
dynamic routing of Wμ > 5.3. (For ρ = .3 we would have used α = .6 and gotten a
delay of Wμ = 1.64, while for ρ = .9 we would have used α = .99 and gotten a delay
of Wμ = 9.10.)
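The three averages compared in this paragraph can be recomputed directly from the delay formula. This is a hedged Python sketch: the α values are the rounded optima quoted in the text, and the names `delay`, `average_delay`, `tracking`, `static` and `lagging` are illustrative assumptions.

```python
def delay(rho, alpha):
    """Dimensionless delay W*mu: one-hop direct route plus two-hop route via C."""
    return alpha / (1 - rho * alpha) + 2 * (1 - alpha) / (1 - rho * (1 - alpha))

LOADS = (0.3, 0.9)                        # the two alternating load levels
BEST = {0.3: 0.99, 0.6: 0.70, 0.9: 0.60}  # rounded optimal alpha, from the text

def average_delay(alpha_for_load):
    """Mean W*mu when the load spends half its time at each level."""
    return sum(delay(rho, alpha_for_load[rho]) for rho in LOADS) / len(LOADS)

# Routing that always uses the alpha matched to the current load.
tracking = average_delay({0.3: BEST[0.3], 0.9: BEST[0.9]})
# Routing fixed at the alpha chosen for the average load rho = .6.
static = average_delay({0.3: BEST[0.6], 0.9: BEST[0.6]})
# Routing that is one load phase out of date.
lagging = average_delay({0.3: BEST[0.9], 0.9: BEST[0.3]})

if __name__ == "__main__":
    print(round(tracking, 2), round(static, 2), round(lagging, 2))
```

The lagging case is dominated by the ρ = .9 phase, where nearly all traffic is still sent down the direct route.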

Of further interest is the fact that the Maximum Entropy value of α = .5 (assuming
we balance traffic by links, not paths) gives a true average delay of Wμ = 2.24
using the correct delay values of Wμ = 1.76 for ρ = .3 and Wμ = 2.72 for ρ = .9, a
penalty of about fifteen percent for taking too low a value of α.

If we look at the Maximum Entropy solution for traffic balanced by paths, i.e.
making the average queue on the direct path equal to the average queue on the
indirect path, we obtain bifurcated routing for all values of ρ:

     ρ       α
    0.     .97
    .3     .64
    .6     .62
    .9     .51

with an average delay for our test case using ρ = .6 for the alternating traffic of
2.10, a penalty of about five percent.

Our conjecture is that the optimal dynamic routing solutions are, in many cases,
similarly unstable under reasonable dynamic load, and that the Maximum Entropy
routings will prove a more robust starting point for distributed dynamic
adjustments.

3.   A More General Case: Inhomogeneous Rates, Many Paths


[Figure 2 diagram not reproduced. Surviving labels: a client (source) offering
rate λ; a server (destination); n paths between them, from an l_1-hop path of
servers of rate μ_1 to an l_n-hop path of servers of rate μ_n.]

Figure 2.   Inhomogeneous node-disjoint paths of multi-hop M/M/1 queues.
Traffic α_i λ goes onto the ith path and sees l_i hops of servers at rate μ_i.


It may seem we are extracting too much from the three node case. It is more
realistic to consider a network offering multiple paths with differing numbers of
hops of differing capacity. For reliability, it is desirable that separate routes
share as few nodes as possible. We will restrict our attention to node-disjoint
paths in which only the origin and destination are common to distinct paths. We
also require that on any single path the service rates on all of the links forming
the path be the same, even though links on different paths may have different
service rates. We contend that this is not a severe restriction, since very high
capacity links in series with much lower capacity links will be dominated by the
bottleneck formed by the slower links, and can be effectively ignored for our
purposes. In Section 4, below, we will give a conservative approximation to the
case of mixed rates on a single path.

Thus consider the extension of this analysis to a network with a single client
offering traffic load λ trying to reach a server via n node-disjoint paths, where
each path, i, acts as a cascade of l_i independent M/M/1 queues of service rate
μ_i. Suppose the client allocates his traffic to path i with weight α_i, providing
Poisson distributed traffic at rate α_i λ to the path. Then the independent M/M/1
queuing model gives an expected delay of

$$ W = \sum_{i=1}^{n} \frac{\alpha_i l_i}{\mu_i - \alpha_i\lambda}, \quad \text{for } 0 \le \alpha_i \le 1, \ \sum_{i=1}^{n} \alpha_i = 1 $$

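Define

$$ d_i = \frac{\alpha_i l_i}{\mu_i - \alpha_i\lambda} $$

and compute the partial derivatives which will be needed in finding minima of W:

$$ \frac{\partial d_i}{\partial \alpha_i} = \frac{\mu_i l_i}{(\mu_i - \alpha_i\lambda)^2} $$

for i = 1, ..., n − 1 and

$$ \frac{\partial d_n}{\partial \alpha_i} = -\frac{\mu_n l_n}{(\mu_n - \alpha_n\lambda)^2} $$

by taking α_n = 1 − α_1 − ... − α_{n−1}, so that the critical points of W as a
function of α_1, ..., α_{n−1} occur when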

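$$ 0 = \frac{\partial W}{\partial \alpha_i} = \frac{\mu_i l_i}{(\mu_i - \alpha_i\lambda)^2} - \frac{\mu_n l_n}{(\mu_n - \alpha_n\lambda)^2} $$

i.e. when

$$ \frac{\mu_i l_i}{(\mu_i - \alpha_i\lambda)^2} = \tau^2 $$

for some τ independent of i, i = 1, ..., n. We will solve these equations for α_i,
but first note that

$$ \mu_i - \alpha_i\lambda = \epsilon_i \frac{(\mu_i l_i)^{1/2}}{\tau} $$

with ε_i = ±1. The negative value of ε_i is "unphysical", since that would require
an overload of the first queue on path i by the distributed traffic. Thus we accept
only the positive roots and obtain

$$ \alpha_i = \frac{1}{\lambda} \left[ \mu_i - \frac{(\mu_i l_i)^{1/2}}{\tau} \right] $$

We can solve for τ by

$$ 1 = \sum_j \alpha_j = \frac{1}{\lambda} \left[ \sum_j \mu_j - \frac{1}{\tau} \sum_j (\mu_j l_j)^{1/2} \right], \qquad \tau = \frac{\sum_j (\mu_j l_j)^{1/2}}{\sum_j \mu_j - \lambda} $$

so that

$$ \alpha_i = \frac{1}{\lambda} \left[ \mu_i - \left( \sum_j \mu_j - \lambda \right) \frac{(\mu_i l_i)^{1/2}}{\sum_j (\mu_j l_j)^{1/2}} \right] $$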
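The closed-form allocation can be evaluated directly. The sketch below is an illustrative Python rendering under the model assumptions of this section; the function name `critical_split` and its argument names are hypothetical. It returns the critical-point values without checking that they lie in [0, 1].

```python
import math

def critical_split(lam, mu, hops):
    """Critical-point traffic fractions alpha_i; entries may be negative.

    lam  -- offered load (lambda)
    mu   -- per-path service rates mu_i
    hops -- per-path hop counts l_i
    """
    root_sum = sum(math.sqrt(m * l) for m, l in zip(mu, hops))
    spare = sum(mu) - lam  # total service rate left over after the offered load
    if lam <= 0 or spare <= 0:
        raise ValueError("need 0 < lam < total service rate")
    return [(m - spare * math.sqrt(m * l) / root_sum) / lam
            for m, l in zip(mu, hops)]

if __name__ == "__main__":
    # Three-node example: equal rates, one-hop direct path and two-hop path via C.
    print(critical_split(0.6, [1.0, 1.0], [1, 2]))
    # At light load the second fraction comes out negative.
    print(critical_split(0.2, [1.0, 1.0], [1, 2]))
```

With μ_1 = μ_2 = 1, l_1 = 1, l_2 = 2 this reproduces the three-node optimum, which gives a quick consistency check between the two sections.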


While all the resulting values of α_i are certainly critical points of W, they may
not be valid minima. We can eliminate any concern about convexity of the problem by
noting that

$$ \frac{\partial^2 W}{\partial \alpha_i \, \partial \alpha_j} = \delta_{ij} \frac{2\lambda\mu_i l_i}{(\mu_i - \alpha_i\lambda)^3} + \frac{2\lambda\mu_n l_n}{(\mu_n - \alpha_n\lambda)^3} $$

which, as the sum of a diagonal matrix with positive terms and a scalar times the
matrix of all ones, is positive definite as long as we have μ_i > α_i λ. (This is
only to be expected since this routing problem is one of a much wider class of
convex minimization problems.) The real question is whether the minima are within
the region of interest, 0 ≤ α_i ≤ 1. We may drive some α_i negative with too small
a value of λ. This corresponds to the ρ < .293 cases in our simple example above.
In that case, we must reduce the allowed range of i by dropping appropriate paths.

To select the paths to be dropped, order the paths so that μ_i/l_i is monotone
non-increasing with i, i.e. so that the path with the fastest hop-corrected service
rate comes first and the slowest path comes last. Compare λ to

$$ \sum_j \mu_j - \left( \frac{\mu_n}{l_n} \right)^{1/2} \sum_j (\mu_j l_j)^{1/2} $$


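If λ is smaller, drop path n and recompute on the reduced network, since in that
case α_n will be negative. To see this, note that

$$ 0 > \alpha_n $$

if and only if

$$ 0 > \mu_n - \left( \sum_j \mu_j - \lambda \right) \frac{(\mu_n l_n)^{1/2}}{\sum_j (\mu_j l_j)^{1/2}} $$

if and only if

$$ \left( \frac{\mu_n}{l_n} \right)^{1/2} \sum_j (\mu_j l_j)^{1/2} < \sum_j \mu_j - \lambda $$

from which the bound on λ follows.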

We can actually drop more such lines at the same time, since the effect of taking
out lines with negative α is to reduce load on other lines, but we cannot assume
that the calculation need not be repeated for the reduced set, since we have no
assurance that more α will not go negative with this reduced load.
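The drop-and-recompute procedure can be sketched as a loop. This Python example is one possible reading of the steps above, dropping a single path per iteration rather than several at once; the name `split_with_dropping` is illustrative and not from the original.

```python
import math

def split_with_dropping(lam, mu, hops):
    """Fractions alpha_i in the original path order; dropped paths get 0.0."""
    if lam <= 0 or lam >= sum(mu):
        raise ValueError("need 0 < lam < total service rate")
    # Order path indices by hop-corrected service rate mu_i / l_i, fastest first.
    active = sorted(range(len(mu)), key=lambda i: mu[i] / hops[i], reverse=True)
    while len(active) > 1:
        total = sum(mu[i] for i in active)
        root_sum = sum(math.sqrt(mu[i] * hops[i]) for i in active)
        last = active[-1]
        # Bound on lambda below which alpha for the slowest path is negative.
        bound = total - math.sqrt(mu[last] / hops[last]) * root_sum
        if lam >= bound:
            break          # slowest remaining path still gets alpha >= 0
        active.pop()       # drop it and recompute on the reduced network
    total = sum(mu[i] for i in active)
    root_sum = sum(math.sqrt(mu[i] * hops[i]) for i in active)
    alpha = [0.0] * len(mu)
    for i in active:
        share = (total - lam) * math.sqrt(mu[i] * hops[i]) / root_sum
        alpha[i] = (mu[i] - share) / lam
    return alpha

if __name__ == "__main__":
    print(split_with_dropping(0.2, [1.0, 1.0], [1, 2]))        # light load
    print(split_with_dropping(12.0, [10.0, 4.0, 1.0], [1, 2, 4]))
```

Because the paths are ordered by μ_i/l_i, only the last active path needs to be tested on each pass: if its fraction is non-negative, so are all the others.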

Once  we  enter  a  regime  in  which  the  critical  points  are  indeed  the  minima,  we 
can  compute  the  minimal  W  from 
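
The expression that followed this sentence is missing from this copy. As a hedged
reconstruction (derived here from the definitions of this section, not taken from
the original), substituting the critical-point allocation
α_i = (1/λ)[μ_i − (Σ_j μ_j − λ)(μ_i l_i)^{1/2} / Σ_j (μ_j l_j)^{1/2}]
into W = Σ_i α_i l_i / (μ_i − α_i λ) gives

$$ W_{\min} = \frac{1}{\lambda} \left[ \frac{\left( \sum_j (\mu_j l_j)^{1/2} \right)^2}{\sum_j \mu_j - \lambda} - \sum_j l_j \right] $$

where the sums run over the paths retained after dropping. As a check, in the
three-node example (μ_1 = μ_2 = μ, l_1 = 1, l_2 = 2) this reduces to
Wμ = (1/ρ)[(3 + 2√2)/(2 − ρ) − 3], which is about 1.94 at ρ = .6, in agreement with
the table of Section 2.
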
4.   Unequal Service Rates on a Given Path

In  the  previous  section,  we  did  not  use  the  fact  that  the  number  of  hops  was  an 
integer,  just  that  it  was  nonnegative.  Thus  we  may  perform  the  same  analysis  with 
fractional  numbers  of  hops.  This  allows  us  to  make  a  conservative  correction  for 
paths  consisting  of  links  of  different  rates  of  service.  We  certainly  cannot  use  a  rate 
any  higher  than  the  rate  of  the  slowest  link  on  the  path,  for  once  we  hit  the  knee  on 
that  link  the  entire  path  will  block.  However,  estimating  all  links  at  that  lowest 
capacity  gives  unduly  pessimistic  estimates  of  the  response  of  the  path. 

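Let μ_{i,j} be the service rates of the l_i links on path i, reordered so that
μ_{i,j−1} ≤ μ_{i,j}, j = 2, ..., l_i. Then define

$$ l'_i = 1 + \mu_{i,1} \sum_{j=2}^{l_i} \frac{1}{\mu_{i,j}} $$

as the pseudo-hop count to use of links all of rate μ_{i,1}. The difference between
the delay estimated on this path with the pseudo-hop count and the real delay is

$$ \frac{l'_i}{\mu_{i,1} - \alpha_i\lambda} - \sum_{j=1}^{l_i} \frac{1}{\mu_{i,j} - \alpha_i\lambda} = \sum_{j=2}^{l_i} \left[ 1 - \frac{\mu_{i,1}}{\mu_{i,j}} \right] \frac{\alpha_i\lambda}{(\mu_{i,1} - \alpha_i\lambda)(\mu_{i,j} - \alpha_i\lambda)} \ge 0 $$

with equality at α_i λ = 0 and for equal service rates.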
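
The pseudo-hop correction is easy to check numerically. The following Python sketch assumes the per-path M/M/1 cascade model of this section; the function names and the sample rates are illustrative only.

```python
def pseudo_hop_count(rates):
    """l' = 1 + mu_1 * sum(1/mu_j for j >= 2), with mu_1 the slowest link rate."""
    ordered = sorted(rates)  # slowest link first
    slowest = ordered[0]
    return 1.0 + slowest * sum(1.0 / r for r in ordered[1:])

def estimated_delay(rates, flow):
    """Path delay if all links ran at the slowest rate, with l' pseudo-hops."""
    return pseudo_hop_count(rates) / (min(rates) - flow)

def real_delay(rates, flow):
    """Path delay as a cascade of independent M/M/1 queues at the true rates."""
    return sum(1.0 / (r - flow) for r in rates)

if __name__ == "__main__":
    rates = [1.0, 2.0, 4.0]       # hypothetical mixed link rates on one path
    for flow in (0.0, 0.25, 0.5):  # flow is alpha_i * lambda on this path
        print(flow, estimated_delay(rates, flow), real_delay(rates, flow))
```

The estimate matches the real delay at zero flow and grows more pessimistic as the flow approaches the slowest link's rate, which is the conservative behavior the text describes.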

5.  Instability  in  the  General  Case 

We could extend this example to chains of M/G/1 queues, or even to more general
models of the node-disjoint paths, but qualitatively we expect the same basic
behavior. If we do our route planning for a light load case, we will tend to favor
the "shortest" paths. If the load then forces those paths onto their delay curve
knees, that routing will be significantly worse than a route plan which off-loaded
some portion of the excess load onto longer paths earlier.

It is tempting to think that we can solve this problem by responding to the load
change quickly enough. The calculation of an optimal bifurcated route for the
node-disjoint M/M/1 cascaded path model is simple, requiring only accurate data on
hop counts and service rates. The difficulty lies in gathering the data, not in
using it. If we rely on multi-hop distributed reporting of effective service rates
and connectivity, by the time we have it available for use, it may well be out of
date. At the very least, if we must compute "optimal" routing, we should do so not
for the current load, but for a load which we can reasonably expect not to exceed
often until the next routing update. Accumulation of variances of loads and delays
would make such estimates feasible.
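
One way to read the last suggestion is to plan for the mean observed load plus a margin proportional to its standard deviation. The Python sketch below is an assumption-laden illustration, not a method given in the text: the class name `LoadTracker`, the margin factor `k`, and the choice of a running (Welford-style) variance are all hypothetical.

```python
import math

class LoadTracker:
    """Running mean and variance of observed load samples."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def observe(self, load):
        """Fold one load sample into the running statistics."""
        self.count += 1
        delta = load - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (load - self.mean)

    def planning_load(self, k=1.0):
        """Mean plus k sample standard deviations (k is an assumed margin)."""
        if self.count < 2:
            return self.mean
        return self.mean + k * math.sqrt(self.m2 / (self.count - 1))

if __name__ == "__main__":
    tracker = LoadTracker()
    for rho in (0.3, 0.9, 0.3, 0.9):  # the alternating load of Section 2
        tracker.observe(rho)
    print(tracker.mean, tracker.planning_load(k=1.0))
```

For the alternating load of Section 2, the mean is .6 while the planning load with k = 1 sits above the .9 peak, so routing would be computed for the heavy phase rather than for the average.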